Baseball is arguably the most data-driven sport in the world today and has a rich recorded history of statistics. The use of sabermetrics to aid managerial decision-making has exploded in recent years, and the methods used to evaluate players have become increasingly complex, making baseball a fascinating game to explore. The goal of any franchise’s front office should be to minimize performance volatility and maximize wins. While baseball far transcends dollar signs and analytics for fans, it is nonetheless a business and must be treated as one to generate winning seasons consistently. Among other things, our report sheds light on responsible benchmarks for player salaries, the richest recruiting epicenters for MLB scouts, and some of the stats most critical for a campaign that will endure through October. The data supporting our analysis is drawn from the Sean Lahman online database, a repository featuring an array of .csv files with records across two centuries.
We analyze the 2017 league MVPs in this summary. In the plots below, we examine how important colleges and salaries are to a team’s success. Here, we ask how important they are to an individual player’s performance.
Last year’s American League MVP was Jose Altuve, who made $3,687,500 in 2016. In comparison, the National League MVP, Giancarlo Stanton, made $9,000,000 in 2016. Surprisingly, neither of them attended college in the U.S. (as indicated by the NA value returned when we looked them up among the college players). We will explore more questions of this kind in the final deliverable.
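The lookup described above can be sketched as a left join between salary records and college records. This is a minimal sketch, not our actual pipeline: the CSV excerpts are shaped like the Lahman `Salaries.csv` and `CollegePlaying.csv` files, the two salary figures are the ones quoted above, and the player IDs, the `example01` player, and the `exampleu` school are illustrative assumptions.

```python
import csv
import io

# Assumed excerpts shaped like Lahman's Salaries.csv and CollegePlaying.csv.
# The two salaries are the figures quoted in the text; the player IDs,
# "example01", and "exampleu" are illustrative placeholders.
SALARIES_CSV = """yearID,playerID,salary
2016,altuvjo01,3687500
2016,stantmi03,9000000
2016,example01,1000000
"""

COLLEGE_CSV = """playerID,schoolID
example01,exampleu
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def salary_with_college(salary_rows, college_rows, year):
    """Left-join salaries to colleges; players with no college row get None."""
    college_by_player = {row["playerID"]: row["schoolID"] for row in college_rows}
    result = {}
    for row in salary_rows:
        if int(row["yearID"]) != year:
            continue
        result[row["playerID"]] = {
            "salary": int(row["salary"]),
            # None plays the role of the NA value mentioned in the text.
            "college": college_by_player.get(row["playerID"]),
        }
    return result


report = salary_with_college(load_rows(SALARIES_CSV), load_rows(COLLEGE_CSV), 2016)
for player_id, info in report.items():
    print(player_id, f"${info['salary']:,}", info["college"])
```

Using a left join rather than an inner join keeps players without a college record in the output, which is what makes the missing value visible in the first place.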
A leading objective of our analysis is to identify the most impactful statistics in the game and determine performance thresholds that can predict certain degrees of success, at both the individual and team level. Our insights mimic the research that scouts and front office employees use to make major budgeting decisions across the league, and the following table is an excellent example of how precedent can inform these decisions. Below is a list of World Series champions since 2000 and their corresponding hitting, pitching, and fielding statistics.
Table 1: World Series Winners Since 2000
| Year | Team | W | L | RS | RA | DIFF | BA | OBP | SLG | OPS | ERA | WHIP | SOA | FP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | New York Yankees | 87 | 74 | 871 | 814 | +57 | 0.277 | 0.354 | 0.450 | 0.804 | 4.76 | 1.43 | 1040 | 0.981 |
| 2001 | Arizona Diamondbacks | 92 | 70 | 818 | 677 | +141 | 0.267 | 0.341 | 0.442 | 0.783 | 3.87 | 1.24 | 1297 | 0.986 |
| 2002 | Los Angeles Angels of Anaheim | 99 | 63 | 851 | 644 | +207 | 0.282 | 0.341 | 0.433 | 0.774 | 3.69 | 1.28 | 999 | 0.986 |
| 2003 | Florida Marlins | 91 | 71 | 751 | 692 | +59 | 0.266 | 0.333 | 0.421 | 0.754 | 4.04 | 1.35 | 1132 | 0.987 |
| 2004 | Boston Red Sox | 98 | 64 | 949 | 768 | +181 | 0.282 | 0.360 | 0.472 | 0.832 | 4.18 | 1.29 | 1132 | 0.981 |
| 2005 | Chicago White Sox | 99 | 63 | 741 | 645 | +96 | 0.262 | 0.322 | 0.425 | 0.747 | 3.61 | 1.25 | 1040 | 0.985 |
| 2006 | St. Louis Cardinals | 83 | 78 | 781 | 762 | +19 | 0.269 | 0.337 | 0.431 | 0.768 | 4.54 | 1.38 | 970 | 0.984 |
| 2007 | Boston Red Sox | 96 | 66 | 867 | 657 | +210 | 0.279 | 0.362 | 0.444 | 0.806 | 3.87 | 1.27 | 1149 | 0.986 |
| 2008 | Philadelphia Phillies | 92 | 70 | 799 | 680 | +119 | 0.255 | 0.332 | 0.438 | 0.770 | 3.88 | 1.36 | 1081 | 0.985 |
| 2009 | New York Yankees | 103 | 59 | 915 | 753 | +162 | 0.283 | 0.362 | 0.478 | 0.840 | 4.26 | 1.35 | 1260 | 0.985 |
| 2010 | San Francisco Giants | 92 | 70 | 697 | 583 | +114 | 0.257 | 0.321 | 0.408 | 0.729 | 3.36 | 1.27 | 1331 | 0.988 |
| 2011 | St. Louis Cardinals | 90 | 72 | 762 | 692 | +70 | 0.273 | 0.341 | 0.425 | 0.766 | 3.74 | 1.31 | 1098 | 0.982 |
| 2012 | San Francisco Giants | 94 | 68 | 718 | 649 | +69 | 0.269 | 0.327 | 0.397 | 0.724 | 3.68 | 1.27 | 1237 | 0.981 |
| 2013 | Boston Red Sox | 97 | 65 | 853 | 656 | +197 | 0.277 | 0.349 | 0.446 | 0.795 | 3.79 | 1.30 | 1294 | 0.987 |
| 2014 | San Francisco Giants | 88 | 74 | 665 | 614 | +51 | 0.255 | 0.311 | 0.388 | 0.699 | 3.50 | 1.17 | 1211 | 0.984 |
| 2015 | Kansas City Royals | 95 | 67 | 724 | 641 | +83 | 0.269 | 0.322 | 0.412 | 0.734 | 3.73 | 1.28 | 1160 | 0.985 |
| 2016 | Chicago Cubs | 103 | 58 | 808 | 556 | +252 | 0.256 | 0.343 | 0.429 | 0.772 | 3.15 | 1.11 | 1441 | 0.983 |
| 2017 | Houston Astros | 101 | 61 | 896 | 700 | +196 | 0.282 | 0.346 | 0.478 | 0.824 | 4.12 | 1.27 | 1593 | 0.983 |
| 2018 | Boston Red Sox | 108 | 54 | 876 | 647 | +229 | 0.268 | 0.339 | 0.453 | 0.792 | 3.75 | 1.25 | 1558 | 0.987 |
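
Two of the columns in Table 1 are derived from others: DIFF is runs scored minus runs allowed (RS - RA), and OPS is on-base percentage plus slugging (OBP + SLG). The sketch below recomputes both for a few rows copied from the table and prints two simple descriptive benchmarks. The function name and the choice of rows are ours for illustration, and the benchmarks are summaries of the sample, not a predictive model.

```python
from statistics import median

# A subset of rows copied from Table 1: (year, W, L, RS, RA, OBP, SLG, ERA).
CHAMPIONS = [
    (2000, 87, 74, 871, 814, 0.354, 0.450, 4.76),
    (2006, 83, 78, 781, 762, 0.337, 0.431, 4.54),
    (2016, 103, 58, 808, 556, 0.343, 0.429, 3.15),
    (2018, 108, 54, 876, 647, 0.339, 0.453, 3.75),
]


def derived_stats(row):
    """Return run differential, OPS, and winning percentage for one row."""
    year, wins, losses, runs_scored, runs_allowed, obp, slg, _era = row
    return {
        "year": year,
        "diff": runs_scored - runs_allowed,           # DIFF = RS - RA
        "ops": round(obp + slg, 3),                   # OPS = OBP + SLG
        "win_pct": round(wins / (wins + losses), 3),  # W / (W + L)
    }


stats = [derived_stats(row) for row in CHAMPIONS]
for entry in stats:
    print(entry)

# Descriptive benchmarks over the sampled champions only.
lowest_wins = min(row[1] for row in CHAMPIONS)
median_era = median(row[7] for row in CHAMPIONS)
print("lowest win total:", lowest_wins)
print("median ERA:", median_era)
```

Extending `CHAMPIONS` to all rows of the table would turn the same summaries into league-wide reference points for what a championship season has looked like since 2000.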
In this section, we analyze all-star players since 2000 and the cities where they attended college. The purpose of the chart below is to help recruiters scout talent in more “productive” cities.
Both the size and the color of the dots encode the same parameter: the total number of all-stars produced. Areas home to SEC and ACC schools seem to produce more talent than other regions. California is also notable in that it has produced the most all-stars of any state.
Fig. 1: Visualization of All-Star Talent by City
This plot is important to our project because scouting is a big part of a team’s future success. By knowing where to scout, a team can spend its budget more efficiently.
In the final product, we hope to make the year selectable, add more options such as batting average and home runs, and modify the map to provide a state-level summary.
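The aggregation behind the all-star map can be sketched as a count of distinct all-stars per college city. This is only a sketch under stated assumptions: the rows are shaped like the Lahman `AllstarFull`, `CollegePlaying`, and `Schools` tables, every ID, city, and state below is a placeholder and not real data, and each player is assumed to have at most one college.

```python
from collections import Counter

# Placeholder rows; none of these IDs, cities, or states are real data.
ALLSTAR_APPEARANCES = [
    ("player_a", 2001), ("player_a", 2002), ("player_b", 2005),
    ("player_c", 2010), ("player_d", 1998),
]
COLLEGE_BY_PLAYER = {
    "player_a": "school_x",
    "player_b": "school_x",
    "player_c": "school_y",
}
CITY_BY_SCHOOL = {
    "school_x": ("City X", "CA"),
    "school_y": ("City Y", "FL"),
}


def allstars_by_city(appearances, college_by_player, city_by_school, since=2000):
    """Count distinct all-stars from `since` onward for each college city."""
    # A set ensures repeat all-star selections count the player only once.
    players = {player for player, year in appearances if year >= since}
    counts = Counter()
    for player in players:
        school = college_by_player.get(player)
        if school is None:  # no U.S. college on record for this player
            continue
        counts[city_by_school[school]] += 1
    return counts


city_counts = allstars_by_city(ALLSTAR_APPEARANCES, COLLEGE_BY_PLAYER, CITY_BY_SCHOOL)
for (city, state), total in city_counts.most_common():
    print(f"{city}, {state}: {total} all-stars")
```

Keying the counter on a (city, state) pair means the same counts can be rolled up by state for the planned state-level summary, and the `since` parameter is one way the year could be made selectable.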
This scatter plot describes the relationship between players’ salaries and their batting averages (hits divided by at-bats). Note that only the 100 highest-paid players in MLB from the 2016 season are shown.
From this chart, we can see that Daniel Murphy was one of the most undervalued players in MLB, with an annual salary of $8 million. His batting average was about 0.35 in the 2016 season.
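The points on the scatter plot can be prepared roughly as follows: rank players by salary, keep the top `n`, and compute batting average as hits divided by at-bats. This is a minimal sketch in which the player rows are hypothetical placeholders and the function names are ours.

```python
def batting_average(hits, at_bats):
    """Batting average = hits / at-bats; None when there are no at-bats."""
    return hits / at_bats if at_bats > 0 else None


# Hypothetical (player, salary, hits, at_bats) rows; placeholders only.
PLAYERS = [
    ("player_a", 30_000_000, 150, 550),
    ("player_b", 8_000_000, 175, 500),
    ("player_c", 20_000_000, 120, 500),
    ("player_d", 500_000, 0, 0),
]


def top_paid_with_average(players, n=100):
    """Return (player, salary, batting average) for the n highest-paid players."""
    ranked = sorted(players, key=lambda p: p[1], reverse=True)[:n]
    return [(name, salary, batting_average(hits, at_bats))
            for name, salary, hits, at_bats in ranked]


points = top_paid_with_average(PLAYERS, n=3)
for name, salary, average in points:
    print(f"{name}: ${salary:,} salary, {average:.3f} BA")

# The best hitter among the selected players, regardless of salary.
best_hitter = max(points, key=lambda p: p[2])
print("highest BA among top earners:", best_hitter[0])
```

A player who combines a high batting average with a comparatively low salary, like `player_b` in this toy data, is the kind of point that stands out on the chart.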
Fig. 3: Distribution of player salaries in Major League Baseball for the year 2016.
As with any sport, baseball has some top players who are valued more highly than others, and it is interesting to see how this shows up in the distribution of player salaries. The plot is heavily concentrated at the bottom, showing that most players made less than $10 million in salary that year. At the top end of the spectrum, the highest earners made upwards of $30 million during 2016. The range of salaries in 2016 is clearly wide: a few players made significantly more than the majority of other players.
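The shape described above can also be summarized numerically. The sketch below reports the median, the mean, the share of players under a threshold, and the maximum. The salary list is a hypothetical placeholder and not the 2016 data, and the $10 million threshold is taken from the discussion above.

```python
from statistics import mean, median

# Hypothetical salaries in dollars; placeholders, not the real 2016 figures.
SALARIES = [
    500_000, 550_000, 1_000_000, 3_000_000,
    5_000_000, 9_000_000, 15_000_000, 32_000_000,
]


def summarize(salaries, threshold=10_000_000):
    """Summarize a salary distribution with a few robust statistics."""
    below = sum(1 for salary in salaries if salary < threshold)
    return {
        "median": median(salaries),
        "mean": mean(salaries),
        "share_below_threshold": below / len(salaries),
        "max": max(salaries),
    }


summary = summarize(SALARIES)
for name, value in summary.items():
    print(name, value)
```

A mean well above the median is the numeric signature of the pattern in the plot: a few very large salaries pull the average up while most players sit near the bottom.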